Patent abstract:
Continuous network segmentation based on a delayed deep deterministic policy gradient (DDPG) defines at least two network segments in a central unit (CU) of a 5G network and, at different periods, identifies a state of each segment and determines, from a reinforcement learning policy comprising a critic model and an actor policy assigned to the segments, a scaling operation allocating different computing resources to corresponding virtual network functions (VNFs) in the CU for the segment, based on the identified state. The scaling operation is applied to the CU and a cost result is monitored. The result is compared with a predetermined optimal result. Gradients are then calculated for the actor and critic models in view of a reward maximized from the difference between the results. Notably, the gradient for the actor policy is applied at a less frequent rate over time than the gradient for the critic model.
Publication number: ES2889699A1
Application number: ES202030731
Filing date: 2020-07-15
Publication date: 2022-01-12
Inventors: Christos Verykoukis; Loizos Christofi; Farhad Rezazadeh; Hatim Chergui
Applicant: Ebos Tech Ltd
Primary IPC classification:
Patent description:

[0002] Continuous network segmentation in a 5G mobile communications network through a delayed deep deterministic policy gradient
[0004] BACKGROUND OF THE INVENTION
[0006] [0001] Field of the invention
[0008] [0002] The present invention relates to the field of mobile data communications and, more specifically, to network slicing in a fifth generation (5G) mobile telecommunications network.
[0010] [0003] Description of Related Art
[0012] [0004] Mobile data communications refers to the exchange of data traffic in a mobile telecommunications network. Digital mobile data communications requires the presence of an underlying physical data communications infrastructure layered on top of a cellular network, such as that demonstrated first by second-generation (2G) digital mobile communications and, more recently, by the considerably more robust and reliable fourth-generation (4G) long-term evolution (LTE) mobile data communications. In 4G LTE, the network architecture allows connectivity from user equipment (UE) to different base stations (eNBs) grouped into different radio access networks (RANs), with each RAN attached to the core network (CN).
[0014] [0005] The eNBs send and receive radio transmissions to and from each UE by utilizing the analog and digital signal processing functions of the LTE air interface through different multiple-input multiple-output (MIMO) antenna groups. Each eNB also controls the low-level operation of each attached UE by sending signaling messages to the UE, such as handover commands. Finally, each eNB is linked to the CN, also known as the Evolved Packet Core (EPC), via an S1 protocol stack interface. It should be noted that each eNB can also communicatively couple to a neighboring eNB via an X2 interface, to enable signaling and packet forwarding during eNB-to-eNB (cell-to-cell) UE handover. The EPC, in turn, is a framework for providing integrated voice and data on the 4G LTE network. Whereas the 2G and third-generation (3G) network architectures process and exchange voice and data across two independent sub-domains, circuit-switched (CS) for voice and packet-switched (PS) for data, the EPC unifies voice and data in an Internet Protocol (IP) services architecture, with voice treated as just another IP application.
[0016] [0006] While 4G represented a giant leap in performance over 2G and 3G networks, 5G represents a huge improvement over 4G. By leveraging massive MIMO antenna arrays at each base station, utilizing millimeter-wave radio communications and beamforming for directed wireless communications with individual UEs, and by splitting the base station into a centralized unit (CU) and a distributed unit (DU), 5G can achieve a data exchange capacity of nearly thirteen terabytes, nearly a twenty-fold improvement over 4G LTE. The CN of the 5G architecture also reflects a substantial change compared with the EPC of 4G. In the 5G CN, the changes have been summarized, in an abstract way, as the so-called "four modernizations": the first is "information technology" or "IT", the second is "Internet", the third is "highly simplified" and the fourth is "service-based". The most notable change in the CN is its service-based network architecture, which separates the control plane from the user plane. Other technologies enable network segmentation and edge computing.
[0018] [0007] In relation to IT modernization, an essential feature of the 5G architecture is the notion of Network Functions Virtualization (NFV). NFV decouples software from hardware by replacing various network functions, such as firewalls, load balancers and routers, with virtualized instances running as software. This eliminates the need to invest in many expensive hardware items and can also speed up installation times, so that revenue-generating services are delivered to the customer more quickly. NFV enables the 5G infrastructure by virtualizing applications on the 5G network. This includes network slicing technology, which allows multiple virtual networks to run simultaneously. NFV can address other 5G challenges through virtualized compute, storage and network resources that are customized according to applications and customer segments.
[0019] [0008] Some have called network slicing the "key ingredient" of 5G, enabling the full potential of the 5G architecture to be realized. Network slicing adds an additional dimension to the NFV domain by allowing multiple logical networks to run simultaneously on a shared physical network infrastructure. By definition, network slicing becomes an integral part of the 5G architecture by creating end-to-end virtual networks that include both networking and storage functions. Operators of a 5G network can then effectively manage diverse 5G use cases with differing demands on throughput, latency and availability by partitioning network resources among multiple users, or "tenants". Through strategically tuned network segmentation and optimized allocation of virtual network function (VNF) instances, the cost of running a network with a 5G architecture can be optimized.
[0021] [0009] With regard to optimizing the configuration of different network segments, fully automated, hands-free operation and management has become critical to exploiting the potential gain of dynamic resource allocation in an NFV-enabled network segment. To that end, many have proposed autonomous management and orchestration of VNFs, whereby the CU "learns" to reconfigure resources, deploy new VNF instances or offload jobs to a central cloud. A noteworthy proposal concerns the Parameterized Action Twin (PAT) Deep Deterministic Policy Gradient (DDPG), a deep reinforcement learning (DRL) based solution that uses the actor-critic method to learn how to provision network resources to online VNFs, given the current network state and the requirements of the deployed VNFs.
[0023] [0010] It is worth noting that the PAT DDPG solution achieves better results than all baseline DRL schemes, as well as a heuristic expansive allocation strategy, in a variety of network scenarios. However, although DDPG is capable of providing excellent results, it also has drawbacks. Like many reinforcement learning algorithms, DDPG training can be unstable and highly dependent on finding the correct hyperparameters for the task at hand. This occurs because the algorithm continually overestimates the Q values of the critic (value) network. These estimation errors accumulate over time and can cause the agent to fall into a local optimum or suffer catastrophic forgetting.
[0025] BRIEF SUMMARY OF THE INVENTION
[0027] [0011] Embodiments of the present invention address deficiencies in the art with respect to network slicing in a 5G network and provide a novel and non-obvious method, system and computer program product for continuous network slicing using a delayed DDPG. In an embodiment of the invention, at least two network segments are defined in a CU of a mobile communication network with a 5G network architecture. Then, at different periods, a network segmentation function identifies a state of each of the network segments and determines, from a reinforcement learning policy assigned to one of the network segments, during a contemporaneous one of the periods, a scaling operation allocating different computing resources to corresponding VNFs in the CU for the one of the network segments, based on the identified state of the one of the network segments. The network segmentation function then applies the determined scaling operation to the CU by allocating the different computing resources to the corresponding VNFs in the CU for the network segment.
[0029] [0012] It should be noted that the reinforcement learning policy includes an actor policy and a critic model. The actor policy takes the state of one of the network segments as input and delivers, as output, a determined scaling operation allocating different computing resources to corresponding virtual network functions (VNFs) in the CU for the one of the network segments, based on the identified state of the one of the network segments. The critic model, in turn, takes the state of the one of the network segments in combination with the determined scaling operation as input and delivers a statistical Q value as output. Optionally, the critic model may be an amalgamation of twin critic models, with the statistical Q value being a minimization of the individual Q values produced by each of the twins. It should be noted that both the actor policy and the critic model can be implemented as deep neural networks that learn from feedback applied to the network.
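By way of a non-authoritative illustration only, the actor policy and the critic model can each be realized as a small feed-forward deep neural network, for example as sketched below in PyTorch. The class names, layer widths and the use of PyTorch are assumptions made for illustration; the embodiments do not prescribe any particular implementation.

import torch
import torch.nn as nn

class Actor(nn.Module):
    # Maps a network-segment state vector to a continuous scaling action.
    def __init__(self, state_dim, action_dim, max_action=1.0):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, action_dim), nn.Tanh())
        self.max_action = max_action

    def forward(self, state):
        # Output is scaled to the valid action range [-max_action, max_action].
        return self.max_action * self.net(state)

class Critic(nn.Module):
    # Maps a concatenated (state, action) pair to a scalar statistical Q value.
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 1))

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=1))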
[0031] [0013] Once the scaling operation has been applied, the network segmentation function monitors a resource cost result of the determined scaling operation in the CU and compares the monitored result with a predetermined optimal result for the determined scaling operation. Next, the network segmentation function determines a statistical Q value in the critic model from a difference between the monitored result and the optimal result and calculates a gradient for the actor policy and the critic model in view of the determined statistical Q value. Finally, the network segmentation function applies the computed gradients to the actor policy and to the critic model for use in a next determined scaling operation at a later one of the periods. However, the network segmentation function applies the calculated gradient for the actor policy at a less frequent rate than it applies the calculated gradient for the critic model.
[0033] [0014] In one aspect of the embodiment, the determined scaling operation is determined taking into account a state space of the one of the network segments, the state space including a number of new UE connections to the one of the network segments, the computing resources assigned to each of the VNFs in the CU for the corresponding one of the network segments, a delay state with respect to a latency cost for each of the network segments, an energy state with respect to an energy cost for the use of computing resources by each of the network segments, a number of users served in each of the network segments and a number of VNF instantiations in each of the network segments. In another aspect of the embodiment, the scaling operation is part of a vertical scaling action space that includes scaling to a larger capacity in the one of the network segments and scaling to a smaller capacity in the one of the network segments. In yet another aspect of the embodiment, the optimal result comprises a maximized inverse of a total network cost of the monitored result in the contemporaneous one of the periods.
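For illustration only, the state space and vertical scaling action space described above might be encoded as in the following sketch; every field name, bound and the apply_scaling helper are hypothetical choices for the sketch, not values taken from the embodiments.

from dataclasses import dataclass
import numpy as np

@dataclass
class SegmentState:
    new_ue_connections: int      # new UE connections to the segment
    cpu_per_vnf: np.ndarray      # computing resources assigned to each VNF in the CU
    delay_cost: float            # delay state with respect to the latency cost
    energy_cost: float           # energy state for the segment's resource usage
    served_users: int            # number of users currently served
    vnf_instances: int           # number of VNF instantiations in the segment

    def to_vector(self) -> np.ndarray:
        return np.concatenate([
            [self.new_ue_connections, self.delay_cost, self.energy_cost,
             self.served_users, self.vnf_instances],
            self.cpu_per_vnf,
        ]).astype(np.float32)

# Vertical scaling action space: a continuous value in [-1, 1], interpreted as
# scaling the segment's CU resources up (positive) or down (negative).
ACTION_LOW, ACTION_HIGH = -1.0, 1.0

def apply_scaling(cpu_per_vnf: np.ndarray, action: float, max_delta: int = 2) -> np.ndarray:
    delta = int(round(action * max_delta))        # e.g. add or remove up to 2 CPUs
    return np.clip(cpu_per_vnf + delta, 1, None)  # never scale a VNF below 1 CPU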
[0035] [0015] In another embodiment of the invention, a C-RAN architecture data processing system can be adapted for continuous network segmentation using a delayed DDPG. The system includes a host computing platform arranged in a CU of a mobile communication network with a 5G network architecture. The system also includes a delayed-DDPG-based continuous network segmentation module. The module includes computer program instructions enabled, while executing on the host computing platform, to define at least two network segments in the CU and to load, for the network segments, a reinforcement learning policy that includes an actor policy and a critic model. The computer program instructions, moreover, continuously identify, at different periods, a state of each of the network segments, provide the identified state to the reinforcement learning policy and receive, from the reinforcement learning policy, during a contemporaneous one of the periods, a generated scaling operation.
[0037] [0016] Furthermore, the program instructions apply the generated scaling operation to the CU by allocating the different computing resources to the corresponding VNFs in the CU for the one of the network segments, and monitor a resource cost result of the determined scaling operation in the CU while comparing the monitored result with a predetermined optimal result for the determined scaling operation. The program instructions further determine a statistical Q value in the critic model from a difference between the monitored result and the optimal result and calculate a gradient for the actor policy and the critic model in view of the determined statistical Q value. Finally, the program instructions apply the computed gradients, respectively, to the actor policy and the critic model for use in a next determined scaling operation in a later one of the periods. In particular, the application of the calculated gradient for the actor policy occurs at a less frequent rate than the application of the calculated gradient for the critic model.
[0039] [0017] Other aspects of the invention will be set forth in part in the description which follows and in part will be obvious from the description, or may be learned by practice of the invention. The aspects of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the appended claims. It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention as claimed.
[0041] BRIEF DESCRIPTION OF THE VARIOUS VIEWS OF THE DRAWINGS
[0043] [0018] The accompanying drawings, which are incorporated and made a part hereof, illustrate embodiments of the invention and, together with the description, serve to explain the principles of the invention. The embodiments illustrated herein are presently preferred, it being understood, however, that the invention is not limited to the particular arrangements and instruments shown, where:
[0045] [0019] Figure 1 is a schematic illustration of a C-RAN adapted for continuous network segmentation based on delayed DDPG; and,
[0047] [0020] Figure 2 is a flowchart illustrating a process for continuous network segmentation based on delayed DDPG in a C-RAN architecture.
[0049] DETAILED DESCRIPTION OF THE INVENTION
[0051] [0021] Embodiments of the invention provide a C-RAN architecture for continuous network segmentation based on delayed DDPG. The C-RAN includes a host computing platform arranged in a CU of a mobile communication network with a 5G network architecture. A delayed-DDPG-based continuous network segmentation module runs in the memory of the platform and, during execution, defines two network segments in the CU. The module also loads, for each of the network segments, a reinforcement learning policy that includes a contemporaneous actor policy, which takes the state of one of the network segments as input and delivers, as output, a determined scaling operation allocating different computing resources to corresponding VNFs in the CU for a corresponding one of the network segments, based on the identified state of the one of the network segments. The reinforcement learning policy also includes a critic model that takes the state of the corresponding one of the network segments in combination with the determined scaling operation as input and delivers a statistical Q value as output.
[0053] [0022] Once the network segments have been defined and the policies loaded, the module continuously identifies, at different periods, a state of each of the network segments, provides the identified state to the actor policy and receives, from the actor policy, during a contemporaneous one of the periods, a generated scaling operation. The module then applies the generated scaling operation to the CU by allocating the different computing resources to the corresponding VNFs in the CU for the one of the network segments. The module then monitors a resource cost result of the determined scaling operation in the CU while comparing the monitored result with a predetermined optimal result for the determined scaling operation.
[0055] [0023] Finally, the module determines a statistical Q value in the critic model from a difference between the monitored result and the optimal result and calculates a gradient for the actor policy and the critic model in view of the determined statistical Q value.
[0056] The module then applies the computed gradients to the respective actor policy and to the respective critic model for use in a next determined scaling operation in a later one of the periods. It should be noted, however, that the module updates the contemporaneous actor policy with the corresponding computed gradient at a less frequent rate than that at which it applies each computed gradient to the critic model. In this way, delayed-DDPG-based continuous network segmentation can be achieved, resulting in a significant performance improvement over a traditional dynamic reinforcement-learning-based network segmentation process. Specifically, delayed DDPG addresses the drawbacks of DDPG by focusing on reducing the overestimation bias seen in previous algorithms through three key features:
[0058] • Clipped double Q learning with a pair of critic models
[0060] • Delayed policy updates and target models
[0062] • Target policy smoothing and noise regularization
[0064] [0024] As a further illustration, Figure 1 is a schematic illustration of a C-RAN adapted for continuous network segmentation based on delayed DDPG. As shown in Figure 1, a C-RAN 130 may be implemented to include a host computing platform 100 that includes one or more computers 110, each having memory 140 and one or more processors 120. In memory 140, multiple different CUs 150 are provided for respective network segments 170, each including one or more VNFs 160 to enable processing of 5G mobile network connections with different UEs 190 through multiple DUs 180. A delayed DDPG network segmentation module 200 is loaded in memory 140 and is executed by at least one of the processors 120 of the host computing platform 100.
[0066] [0025] The delayed DDPG network segmentation module 200 includes computer program instructions that, when executed in memory 140, receive a state of one of the network segments 170 and provide the state to an actor policy 115A. The actor policy 115A returns a scaling operation that includes either adding more CPUs 120 to the corresponding one of the CUs 150 for the network segment 170, or removing one or more CPUs 120 from the corresponding one of the CUs 150 for the network segment 170. The program instructions then monitor a result of the scaling operation and compare the result with an optimal result. The comparison is provided to the pair of critic models 135A, 145A along with the scaling operation, such that the critic models each produce a corresponding statistical Q value, which values are then combined through a minimization operation.
[0068] [0026] The program instructions then provide the combined Q value in a gradient to the actor policy 115A and the critic models 135A, 145A, and the gradients for the actor policy 115A and the critic models 135A, 145A are applied, respectively, to a corresponding actor target 115B and a pair of critic targets 135B, 145B. Finally, the program instructions update the critic models 135A, 145A with the critic targets 135B, 145B and update the actor policy 115A with the actor target 115B, but the program instructions perform the update of the actor policy 115A at a less frequent rate than the update rate of the critic models 135A, 145A. In this way, the delayed DDPG can be achieved in the C-RAN 130 of the mobile telecommunications network with a 5G network architecture.
[0070] [0027] As will be recognized, the program instructions of the delayed DDPG network segmentation module 200 are intended to adjust the parameters φ of the actor policy 115A in the direction of a performance gradient ∇φ J(πφ). The performance gradient, as it is to be applied to the actor policy 115A, can be expressed mathematically as follows:

[0072] ∇φ J(πφ) = ∫ ρ^π(s) ∇φ πφ(s) ∇a Q^π(s, a)|a=πφ(s) ds

[0075] which is equal to

[0080] ∇φ J(πφ) = E s∼ρ^π [ ∇φ πφ(s) ∇a Q^π(s, a)|a=πφ(s) ]. The actor policy 115A may be parameterized as a value function with the goal of finding the optimal policy πφ, where φ comprises the update weights of the actor policy 115A. The expected return can be approximated in many ways. In one example, the gradient of the expected return can be calculated with respect to the parameters φ as ∇φ J(φ). As can be seen, gradient ascent, rather than gradient descent, is used to update the parameters: φt+1 = φt + α ∇φ J(πφ)|φt.
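Because common deep learning toolkits minimize a loss via gradient descent, the gradient ascent update above is typically realized by descending on the negated objective. A minimal PyTorch sketch under that assumption (network sizes, batch size and learning rate are illustrative):

import torch
from torch import nn, optim

state_dim, action_dim = 8, 1
actor = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, action_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(), nn.Linear(64, 1))
actor_opt = optim.Adam(actor.parameters(), lr=3e-4)

states = torch.randn(32, state_dim)                     # a batch of segment-state vectors
actions = actor(states)                                 # a = pi_phi(s)
q_values = critic(torch.cat([states, actions], dim=1))  # Q(s, pi_phi(s))

actor_loss = -q_values.mean()   # minimizing -J(phi) is gradient ascent on J(phi)
actor_opt.zero_grad()
actor_loss.backward()
actor_opt.step()                # phi <- phi + alpha * grad_phi J(pi_phi)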
[0081] [0028] In the actor-critic method of Figure 1, two models work at the same time: the actor policy 115A is a policy that takes a state as input and delivers an action as output, while the critic models 135A, 145A each take a concatenated state and action and return a Q value, such that the actor policy 115A can be updated through the deterministic policy gradient

∇φ J(πφ) = E s∼ρ^π [ ∇φ πφ(s) ∇a Q^π(s, a)|a=πφ(s) ],

[0086] where

[0088] Q^π(s, a) = E st∼ρ^π, at∼π [ Rt | s, a ]

[0090] is the statistical Q value, also known as the critic or value function.
[0092] [0029] More specifically, a random experience is initially stored in a replay buffer β. In other words, the transition (st, at, rt, st+1) is stored for the purpose of training a deep Q network. Next, a random batch B is sampled from the buffer β and, for all transitions (stB, atB, rtB, stB+1) of B, the predictions are Q(stB, atB) and the targets, considered as the optimal immediate return, are exactly the first part of the temporal difference (TD) learning error, namely R(stB, atB) + γ maxa Q(stB+1, a). The loss between the predictions and the targets can then be computed over the entire batch B. Preferably, a separate target model is used, instead of the Q network itself, to compute the target in order to achieve more stability in the learning algorithm. As will be recognized below, the TD process is based on the actor-critic model, while making use of three additional processes in order to improve the TD algorithm:
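A minimal sketch of the replay buffer and the TD target described in this paragraph, assuming for simplicity a single Q network (the twin-critic refinement of process (1) follows below); the class and function names are illustrative only.

import random
from collections import deque
import numpy as np

class ReplayBuffer:
    # Stores transitions (s, a, r, s', done) and samples random mini-batches.
    def __init__(self, capacity: int = 100_000):
        self.buffer = deque(maxlen=capacity)

    def store(self, s, a, r, s_next, done):
        self.buffer.append((s, a, r, s_next, done))

    def sample(self, batch_size: int):
        batch = random.sample(self.buffer, batch_size)
        s, a, r, s_next, done = map(np.array, zip(*batch))
        return s, a, r, s_next, done

def td_target(reward, q_next_max, done, gamma: float = 0.99):
    # y = R(s, a) + gamma * max_a Q(s', a), with bootstrapping disabled at episode end
    return reward + gamma * q_next_max * (1.0 - done)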
[0094] [0030] (1) Clipped double Q learning with a pair of critic models:
[0096] [0031] The first additional process uses two deep neural networks (DNNs) for the two actor models 115A, 115B, denoted by φ for the DNN of the actor policy 115A and φ' for the DNN of the actor target 115B. In addition, a pair of DNNs is provided for the critic models 135A, 145A and another pair for the critic targets 135B, 145B, denoted θ1, θ2 for the parameterization of the value networks of the critic models 135A, 145A and θ'1, θ'2 for the critic targets 135B, 145B. Therefore, two machine learning processes occur at the same time, namely Q learning and policy learning, and the combination addresses approximation error, reduces bias and finds the highest statistical Q value. For each element and transition in the batch, the actor target 115B produces a next action a' based on the next state s', and Gaussian noise is added to a'. The critic targets 135B, 145B take the pair (s', a') and return two Q values, Q't1 and Q't2, as output. Next, the minimum, min(Q't1, Q't2), as a combination of the statistical Q values, is taken as the approximated value for the DNNs of the critic targets 135B, 145B.
[0098] [0032] The DNNs for the critic targets 135B, 145B are used to provide the value estimate through a combination of the statistical Q values produced, as follows:

[0100] Qt = r + γ · min(Q'1, Q'2)

[0102] Consequently, the DNNs for the two critic models 135A, 145A return two Q values, Q1(s, a) and Q2(s, a). The loss can then be calculated for the two critic models 135A, 145A with the mean squared error (MSE). In order to minimize the loss over the iterations using the backpropagation technique, an efficient optimizer known as adaptive moment estimation (Adam) can be used:

[0104] L = L_MSE(Q1, Qt) + L_MSE(Q2, Qt)

[0106] ∇φ J(φ) = N⁻¹ Σ ∇a Qθ1(s, a)|a=πφ(s) ∇φ πφ(s)
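A hedged PyTorch sketch of process (1), computing the clipped double-Q target Qt = r + γ·min(Q'1, Q'2) and the summed MSE loss of the two critic models. The critics are assumed to accept a concatenated (state, action) tensor, and all names are illustrative rather than prescribed by the embodiments.

import torch
from torch import nn

def critic_targets_and_loss(critic_1, critic_2, target_critic_1, target_critic_2,
                            s, a, r, s_next, a_next_smoothed, done, gamma=0.99):
    # r and done are expected as column tensors of shape (N, 1) so they broadcast with Q.
    with torch.no_grad():
        q1_next = target_critic_1(torch.cat([s_next, a_next_smoothed], dim=1))
        q2_next = target_critic_2(torch.cat([s_next, a_next_smoothed], dim=1))
        q_t = r + gamma * (1.0 - done) * torch.min(q1_next, q2_next)  # Qt = r + gamma*min(Q'1, Q'2)
    q1 = critic_1(torch.cat([s, a], dim=1))
    q2 = critic_2(torch.cat([s, a], dim=1))
    mse = nn.MSELoss()
    return mse(q1, q_t) + mse(q2, q_t)  # L = L_MSE(Q1, Qt) + L_MSE(Q2, Qt)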
[0108] [0033] (2) Delayed policy updates and target models:
[0110] [0034] The second additional process allows for a delayed update of the actor policy 115A. Specifically, the DNN of the actor policy 115A is updated less frequently than the DNNs of the critic models 135A, 145A in order to estimate values with lower variance. The update rule for the target networks is provided by Polyak averaging, in order to update the parameters by:

[0112] θ'i ← τ θi + (1 − τ) θ'i,    φ' ← τ φ + (1 − τ) φ'

[0115] where τ < 1 is a hyperparameter that adjusts the speed of the update.
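A sketch of process (2), assuming illustrative values τ = 0.005 and a policy update frequency of 2, and assuming critics that accept a concatenated (state, action) tensor. The soft_update helper implements the Polyak rule above, and the actor and target refresh is gated so that it occurs only on every policy_freq-th iteration.

import torch

def soft_update(target_net, net, tau: float = 0.005):
    # theta' <- tau * theta + (1 - tau) * theta'
    with torch.no_grad():
        for p_t, p in zip(target_net.parameters(), net.parameters()):
            p_t.mul_(1.0 - tau).add_(tau * p)

POLICY_FREQ = 2  # actor and targets updated once per POLICY_FREQ critic updates

def maybe_update_actor(step, actor, actor_target, critic_1, critic_1_target,
                       critic_2, critic_2_target, actor_opt, states):
    if step % POLICY_FREQ != 0:
        return  # delayed update: skip this iteration
    actor_loss = -critic_1(torch.cat([states, actor(states)], dim=1)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()
    for tgt, src in [(actor_target, actor), (critic_1_target, critic_1),
                     (critic_2_target, critic_2)]:
        soft_update(tgt, src)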
[0116] [0035] (3) Target policy smoothing and noise regularization:
[0118] [0036] The third additional process serves to smooth the actor target 115B and the critic targets 135B, 145B. In this regard, when the critic models 135A, 145A are updated, a learning target computed using a deterministic policy is highly prone to inaccuracies caused by function approximation error, so that the variance of the target increases. This induced variance is reduced through regularization in order to make the exploration of all possible continuous parameters safe. To this end, Gaussian noise is added to the next action a' to prevent excessively large actions from perturbing the state of the environment:

[0123] a' = πφ'(s') + ε,    ε ∼ clip(N(0, σ), −c, c)

where the noise ε is selected from a Gaussian distribution with standard deviation σ and clipped to a value range between −c and c to encourage exploration. To avoid the error of using an impossible action value, the resulting action is also clipped to the range of possible actions (min_action, max_action).
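A sketch of process (3), the smoothed and clipped target action a' = clip(πφ'(s') + ε, min_action, max_action) with ε ∼ clip(N(0, σ), −c, c); the values of σ, c and the action bounds shown are illustrative assumptions.

import torch

def smoothed_target_action(actor_target, s_next, sigma=0.2, c=0.5,
                           min_action=-1.0, max_action=1.0):
    with torch.no_grad():
        a_next = actor_target(s_next)                              # pi_phi'(s')
        noise = (torch.randn_like(a_next) * sigma).clamp(-c, c)    # eps ~ clip(N(0, sigma), -c, c)
        return (a_next + noise).clamp(min_action, max_action)      # keep a' within the action range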
[0125] [0037] The above TD3 (twin delayed DDPG) based network segmentation method can be summarized as follows:
[0126] Initialize actor network φ and critic networks θ1, θ2
[0127] Initialize (copy parameters to) target networks φ', θ'1, θ'2
[0128] Initialize replay buffer β
[0129] Import custom gym environment ('smartech-v0')
[0130] while t < max_timesteps do
[0131]     if t < start_timesteps then
[0132]         a = env.action_space.sample()
[0133]     else
[0134]         a = πφ(s) + ε,  ε ∼ N(0, σ)
[0135]     end
[0136]     next_state, reward, done, _ = env.step(a)
[0137]     store the new transition (st, at, rt, st+1) into β
[0138]     if t > start_timesteps then
[0139]         sample a batch of transitions (stB, atB, rtB, stB+1) from β
[0140]         a' = πφ'(s') + ε,  ε ∼ clip(N(0, σ), −c, c)
[0141]         Qt = r + γ · min(Q't1, Q't2)
[0142]         L = L_MSE(Q1, Qt) + L_MSE(Q2, Qt)
[0143]         θi ← argmin_θi N⁻¹ Σ (Qt − Qθi(s, a))²
[0144]         if t % policy_freq == 0 then
[0147]             update φ by the deterministic policy gradient ∇φ J(φ) and update the target networks: θ'i ← τ θi + (1 − τ) θ'i, φ' ← τ φ + (1 − τ) φ'
[0148]         end
        end
[0149]     if done then
[0150]         obs, done = env.reset(), False
[0151]     end
[0152]     t = t + 1
[0153] end
[0155] [0038] By way of further illustration and summary of the TD3-based network segmentation methodology above, Figure 2 is a flowchart illustrating a process for continuous network segmentation based on delayed DDPG in a C-RAN architecture. Starting in block 210, a network segment is created in the CU as an environment and, in block 215, the environment is initialized in memory for the network segment. In block 220, an actor policy and an actor target are loaded into memory for a selected network segment. Next, in block 225, two critic models and two critic targets are also loaded. Thereafter, in block 230, actions are executed randomly and, in block 235, a batch of transitions is selected. In block 240, the actor target takes the next state and produces a next action. In block 245, Gaussian noise is added to the next action and the next action is clipped. In block 250, the critic targets compute Q values from the state and action and, in block 255, a minimum of the Q values is computed. In block 260, a final target is determined with respect to a discount factor and, in block 265, the critic models receive the action and state and return their Q values. In block 270, the critic loss is computed with respect to the final target and, in block 275, the critic loss is backpropagated in order to update the parameters of the critic models. In decision block 280, it is determined whether the delayed actor model update is due. If so, in block 285, the actor model is updated with the output of the first critic model and, in block 290, the actor and critic target weights are updated.
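To tie the flowchart of Figure 2 back to the sketches above, the following non-authoritative skeleton shows one training iteration; ReplayBuffer, smoothed_target_action, critic_targets_and_loss and soft_update refer to the illustrative helpers sketched earlier and are not defined by the embodiments.

import torch

def train_step(step, buffer, batch_size, actor, actor_target,
               critic_1, critic_2, critic_1_target, critic_2_target,
               critic_opt, actor_opt, gamma=0.99, policy_freq=2):
    # Blocks 235-255: sample a batch, smooth the target action, take the minimum target Q.
    s, a, r, s_next, done = (torch.as_tensor(x, dtype=torch.float32)
                             for x in buffer.sample(batch_size))
    a_next = smoothed_target_action(actor_target, s_next)
    # Blocks 260-275: compute the final target and the critic loss, then backpropagate.
    loss = critic_targets_and_loss(critic_1, critic_2, critic_1_target, critic_2_target,
                                   s, a, r.unsqueeze(1), s_next, a_next,
                                   done.unsqueeze(1), gamma)
    critic_opt.zero_grad()
    loss.backward()
    critic_opt.step()
    # Blocks 280-290: delayed actor update and Polyak update of the target networks.
    if step % policy_freq == 0:
        actor_loss = -critic_1(torch.cat([s, actor(s)], dim=1)).mean()
        actor_opt.zero_grad()
        actor_loss.backward()
        actor_opt.step()
        for tgt, src in [(actor_target, actor), (critic_1_target, critic_1),
                         (critic_2_target, critic_2)]:
            soft_update(tgt, src)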
[0167] [0039] Accordingly, in accordance with the present invention, a reward-penalty mechanism is provided in order to mitigate the negative impact of destabilized training. The reward-penalty mechanism clips the network values to constants and to constraint values related to quality of service (QoS) and other thresholds. Therefore, as will be recognized, the proposed technique applies robotics algorithms to the field of telecommunications. Likewise, experience replay is one of the main aspects of learning behaviors in biological systems. In this case, to speed up the training process and to improve learning efficiency, a score-based asynchronous actor-learner is optimized for the network segmentation environment.
[0169] [0040] The present invention may be embodied as a system, a method, a computer program product, or any combination thereof. The computer program product may include a computer-readable storage medium, or media, having computer-readable program instructions thereon for causing a processor to carry out aspects of the present invention. The computer-readable storage medium may be a tangible device capable of retaining and storing instructions for use by an instruction execution device. The computer-readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
[0171] [0041] The computer-readable program instructions described herein may be downloaded to respective computing/processing devices from a computer-readable storage medium, or to an external computer or external storage device over a network. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, as well as combinations of blocks in the flowchart illustrations and/or block diagrams, may be implemented by computer-readable program instructions.
[0173] [0042] These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the block or blocks of the flowchart and/or block diagram. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, a programmable data processing apparatus and/or other devices to function in a particular manner, such that the computer-readable storage medium having instructions stored therein comprises an article of manufacture including instructions that implement aspects of the function/act specified in the block or blocks of the flowchart and/or block diagram.
[0175] [0043] The computer-readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be carried out on the computer, other programmable apparatus or other device to produce a computer-implemented process, such that the instructions which execute on the computer, other programmable apparatus or other device implement the functions/acts specified in the block or blocks of the flowchart and/or block diagram.
[0176] [0044] The flowchart and block diagrams in the figures illustrate the architecture, functionality and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block of the flowchart or block diagrams may represent a module, a segment or a portion of instructions, comprising one or more executable instructions for implementing the specified logical function or functions. In some alternative implementations, the functions noted in the block may occur in a different order than that noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, as well as combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or by combinations of special-purpose hardware and computer instructions.
[0178] [0045] Finally, the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will further be understood that the terms "include", "includes" and/or "including", when used herein, specify the presence of the disclosed features, integers, steps, operations, elements and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or groups thereof.
[0180] [0046] The corresponding structures, materials, acts and equivalents of all means or step plus function elements in the following claims are intended to include any structure, material or act for performing the function in combination with other claimed elements, as specifically claimed. The description of the present invention has been presented for purposes of illustration and description, but is not intended to be exhaustive or to limit the invention to the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. The embodiment was chosen and described in order to best explain the principles of the invention and its practical application, and to enable others of ordinary skill in the art to understand the invention with respect to various embodiments with various modifications as are suited to the particular use contemplated.
[0182] [0047] Having thus described the invention of the present application in detail and with reference to embodiments thereof, it will be apparent that modifications and variations are possible without departing from the scope of the invention defined in the appended claims, as follows:
Claims:
Claims (18)
[1]
1. Method for continuous network segmentation, the method comprising:
define at least two network segments in a central unit (CU) of a mobile communication network with 5G network architecture;
load, for the network segments, a reinforcement learning policy comprising an actor policy that takes the state of one of the network segments as input and delivers as output a determined scaling operation allocating different computing resources to corresponding virtual network functions (VNFs) in the CU for the one of the network segments from the identified state of the one of the network segments, the reinforcement learning policy additionally comprising a critic model that takes the state of the one of the network segments in combination with the determined scaling operation as input and returns a statistical Q value as output for use in reinforcement learning of the actor policy;
continuously, at different periods, identifying a state of each of the network segments, providing the identified state to the actor policy, and receiving, from the actor policy, during a contemporaneous one of the periods, a generated scaling operation;
apply the generated scaling operation to the CU by allocating the different computing resources to the corresponding VNFs in the CU to one of the network segments;
monitoring a resource cost result of the determined scaling operation in the CU and comparing the monitored result with a predetermined optimal result for the determined scaling operation;
determining a statistical Q value in the critic model from a difference between the monitored result and the optimal result and calculating a gradient for the actor policy and the critic model considering the determined statistical Q value; and
apply the computed gradients to each of the respective actor policy and respective critic model for use in a next determined scaling operation in a later period of the periods, but applying the one of the computed gradients corresponding to the actor policy at a less frequent rate than an application of the other of the computed gradients to the critic model.
[2]
2. Method according to claim 1, wherein the determined scaling operation is determined taking into account a state space of the one of the network segments comprising a number of new user equipment (UE) connections to the one of the network segments, computing resources assigned to each of the VNFs in the CU for the one of the network segments, a delay state with respect to a latency cost for each of the at least two network segments, an energy state with respect to an energy cost for the use of computing resources by each of the at least two network segments, a number of users served in each of the at least two network segments and a number of VNF instantiations in each of the at least two network segments.
[3]
3. Method according to claim 1, wherein the scaling operation is part of a vertical scaling action space that includes scaling to a larger capacity in the one of the network segments and scaling to a smaller capacity in the one of the network segments.
[4]
4. Method according to claim 1, wherein the optimal result comprises a maximized inverse of a total network cost of the monitored result in the contemporaneous period of the periods.
[5]
5. Method according to claim 1, wherein the reinforcement learning policy comprises twins of the critic model.
[6]
6. Method according to claim 5, wherein the Q value used in the calculation of the gradient for the actor policy is a minimum of the Q values provided by each of the twins.
[7]
7. Cloud Radio Access Network (C-RAN) architecture data processing system adapted for continuous network segmentation, the system comprising:
a host computing platform arranged in a central unit (CU) of a mobile communications network with a 5G network architecture, the CU comprising a communicative coupling to a multiplicity of different distributed units (DUs), at least one of the DUs comprising a massive multiple-input multiple-output (MIMO) antenna system transmitting over millimeter-wave frequencies, the platform comprising one or more computers, each comprising memory and at least one processor; and
a DDPG-based continuous network segmentation module comprising computer program instructions enabled while executing on the host computing platform to perform:
define at least two network segments in the CU;
load, for the network segments, a reinforcement learning policy comprising an actor policy that takes the state of one of the network segments as input and delivers as output a determined scaling operation allocating different computing resources to corresponding virtual network functions (VNFs) in the CU for the one of the network segments from the identified state of the one of the network segments, the reinforcement learning policy additionally comprising a critic model that takes the state of the one of the network segments in combination with the determined scaling operation as input and returns a statistical Q value as output;
continuously, at different periods, identify a state of each of the network segments, provide the identified state to the reinforcement learning policy, and receive, from the reinforcement learning policy, during a contemporaneous period of the periods, a generated scaling operation;
apply the generated scaling operation to the CU by allocating the different computing resources to the corresponding VNFs in the CU to one of the network segments;
monitoring a resource cost result of the determined scaling operation in the CU and comparing the monitored result with a predetermined optimal result for the determined scaling operation;
determining a statistical Q value in the critic model from a difference between the monitored result and the optimal result and calculating a gradient for the actor policy and the critic model considering the determined statistical Q value; and
apply the computed gradients to each of the respective actor policy and respective critic model for use in a next determined scaling operation in a later period of the periods, but applying the one of the computed gradients corresponding to the actor policy at a less frequent rate than an application of the other of the computed gradients to the critic model.
[8]
8. System according to claim 7, wherein the determined scaling operation is determined taking into account a state space of the one of the network segments comprising a number of new user equipment (UE) connections to the one of the network segments, computing resources allocated to each of the VNFs in the CU for the one of the network segments, a delay state with respect to a latency cost for each of the at least two network segments, an energy state with respect to an energy cost for the use of computing resources by each of the at least two network segments, a number of users served in each of the at least two network segments and a number of VNF instantiations in each of the at least two network segments.
[9]
9. System according to claim 7, wherein the scaling operation is part of a vertical scaling action space that includes scaling to a larger capacity in the one of the network segments and scaling to a smaller capacity in the one of the network segments.
[10]
10. System according to claim 7, wherein the optimal result comprises a maximized inverse of a total network cost of the monitored result in the contemporaneous period of the periods.
[11]
11. System according to claim 7, wherein the reinforcement learning policy comprises twins of the critic model.
[12]
12. System according to claim 11, wherein the Q value used in the calculation of the gradient for the actor policy is a minimum of the Q values provided by each of the twins.
[13]
13. Computer program product for continuous network segmentation, the computer program product including a computer-readable storage medium having program instructions embodied therein, the program instructions being executable by a device to cause the device to carry out a method comprising:
define at least two network segments in a central unit (CU) of a mobile communications network with a 5G network architecture;
load, for the network segments, a reinforcement learning policy comprising an actor policy that takes the state of one of the network segments as input and delivers as output a determined scaling operation allocating different computing resources to corresponding virtual network functions (VNFs) in the CU for the one of the network segments from the identified state of the one of the network segments, the reinforcement learning policy additionally comprising a critic model that takes the state of the one of the network segments in combination with the determined scaling operation as input and returns a statistical Q value as output;
continuously, at different periods, identify a state of each of the network segments, provide the identified state to the reinforcement learning policy, and receive, from the reinforcement learning policy, during a contemporaneous period of the periods, a generated scaling operation;
apply the generated scaling operation to the CU by allocating the different computing resources to the corresponding VNFs in the CU to one of the network segments;
monitoring a resource cost result of the determined scaling operation in the CU and comparing the monitored result with a predetermined optimal result for the determined scaling operation;
determining a statistical Q value in the critic model from a difference between the monitored result and the optimal result and calculating a gradient for the actor policy and the critic model considering the determined statistical Q value; and
apply the computed gradients to each of the respective actor policy and respective critic model for use in a next determined scaling operation in a later period of the periods, but applying the one of the computed gradients corresponding to the actor policy at a less frequent rate than an application of the other of the computed gradients to the critic model.
[14]
14. Computer program product according to claim 13, wherein the determined scaling operation is determined taking into account a state space of the one of the network segments comprising a number of new user equipment (UE) connections to the one of the network segments, computing resources allocated to each of the VNFs in the CU for the one of the network segments, a delay state with respect to the latency cost for each of the at least two network segments, a power state with respect to a power cost for the use of the computing resources for each of the at least two network segments, a number of users served on each of the at least two network segments, and a number of VNF instances on each of the at least two network segments.
[15]
15. Computer program product according to claim 13, wherein the scaling operation is part of a vertical scaling action space that includes scaling to a larger capacity in one of the network segments and scaling to a smaller capacity in one of the network segments.
[16]
16. Computer program product according to claim 13, wherein the optimal result comprises a maximized inverse of a total network cost of the monitored result in the contemporaneous period of periods.
[17]
17. Computer program product according to claim 13, wherein the reinforcement learning policy comprises twins of the critic model.
[18]
18. Computer program product according to claim 17, wherein the Q value used in the calculation of the gradient for the actor policy is a minimum of the Q values provided by each of the twins.
Patent family:
Publication number | Publication date
DE102021116590A1|2021-12-30|
GR1010062B|2021-08-04|
GB202108215D0|2021-07-21|
Legal status:
2022-01-12| BA2A| Patent application published|Ref document number: 2889699 Country of ref document: ES Kind code of ref document: A1 Effective date: 20220112 |
Priority:
Application number | Filing date | Title
GR20200100374A|2020-06-29|Continuous network slicing in a 5g cellular communications network via a delayed deep deterministic policy gradient|